feat: sandcastle refinement loop with critic-based convergence #111

Merged: jerome-benoit merged 32 commits into main from feat/sandcastle-refinement-loop on May 5, 2026

Conversation

Owner

@jerome-benoit jerome-benoit commented May 4, 2026

Description

Replace single-pass implement→review→merge with a modular iterative implement↔critic refinement loop. Each task gets its own parallel sandbox with convergence detection, quality ratchet, and automated PR creation.

Architecture

Planner (opus) → selects issues
  For each issue (parallel, max 3):
    Sandbox with implement↔critic loop:
      Implementer (sonnet) → codes + commits + pushes
      Critic (sonnet) → structured findings JSON (nonce-tagged)
      Dedup (context-hash) → convergence check
      Quality ratchet → rollback on regression
      Best-state checkpoint → restore optimal intermediate
      Validation-in-loop (ARCS) → deterministic convergence
    Finalize: validate → rebase → PR (draft if non-converged)
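
The control flow in the diagram above can be sketched as a small synchronous function. This is a minimal illustration, not the PR's implementation: `runLoopSketch`, the callback signatures, and the precomputed `key` field are assumptions, and the real loop runs sandboxed agents asynchronously with a ratchet and checkpointing on top.

```typescript
// Hypothetical names throughout; agents are modeled as plain callbacks.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  key: string; // dedup key, assumed to be precomputed for this sketch
  severity: Severity;
}

interface LoopResult {
  status: "converged" | "exhausted";
  roundsCompleted: number;
  outstanding: Finding[];
}

function runLoopSketch(
  implement: (toFix: Finding[]) => void,
  critic: (round: number) => Finding[],
  maxRounds: number,
): LoopResult {
  const seen = new Set<string>();
  let outstanding: Finding[] = [];

  for (let round = 1; round <= maxRounds; round++) {
    implement(outstanding);
    const findings = critic(round);

    // A finding is "fresh" only if its key was never reported before.
    const fresh = findings.filter((f) => !seen.has(f.key));
    for (const f of fresh) seen.add(f.key);
    outstanding = findings;

    if (fresh.length === 0) {
      // No new information: converge unless CRITICAL/HIGH findings persist.
      const blocking = findings.filter(
        (f) => f.severity === "CRITICAL" || f.severity === "HIGH",
      );
      return {
        status: blocking.length === 0 ? "converged" : "exhausted",
        roundsCompleted: round,
        outstanding: blocking,
      };
    }
  }
  // Round cap reached while the critic was still producing new findings.
  return { status: "exhausted", roundsCompleted: maxRounds, outstanding };
}
```

A round that yields no unseen findings ends the loop; whether that counts as convergence depends on what is still outstanding.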

Key Design Decisions

  • Flat iteration budget (50/round): ARCS, SWE-Agent, and AutoCodeRover all use flat budgets
  • Context-hash dedup (SHA-256 over ±3 lines): drift-safe, following the CodeQL/Qodana pattern
  • Severity-weighted convergence: refuses convergence while CRITICAL/HIGH findings persist (OpenHands)
  • Best-state tracking: resets to the best intermediate state on non-convergence (SWE-Agent)
  • Validation-in-loop: deterministic convergence when tests pass (ARCS)
  • Async subprocess execution: util.promisify(execFile) unblocks the event loop for true parallelism
  • Nonce-tagged critic output: prevents injection from code content
  • One PR per task: no batch merge, each issue gets its own PR
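
To illustrate why a context hash is drift-safe, the sketch below hashes the trimmed text of the lines around a finding instead of its line number. It is an assumption-laden sketch: `hashContext` and its in-memory `lines` parameter are hypothetical, and the PR's own helper reads files from the worktree.

```typescript
import { createHash } from "node:crypto";

/**
 * Hashes the window of `radius` lines on each side of a 1-based line number.
 * Hypothetical helper for illustration; it works on an in-memory array.
 */
function hashContext(lines: string[], line: number, radius = 3): string {
  const start = Math.max(0, line - 1 - radius);
  const end = Math.min(lines.length, line + radius);
  // Trimming makes the key insensitive to re-indentation.
  const context = lines
    .slice(start, end)
    .map((l) => l.trim())
    .join("\n");
  return createHash("sha256").update(context).digest("hex");
}

// Inserting unrelated lines above the finding shifts its line number
// but leaves the surrounding text, and therefore the hash, unchanged.
const before = ["a", "b", "c", "d", "target()", "e", "f", "g", "h"];
const after = ["// new header", "// another line", ...before];
console.log(hashContext(before, 5) === hashContext(after, 7));
```

The trade-off is that any edit inside the window changes the key, so a finding whose surrounding code was touched will be treated as new.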

Modules

| File | Lines | Responsibility |
| --- | --- | --- |
| constants.ts | 64 | Shared constants + execFileAsync + getHeadSha + toErrorMessage |
| types.ts | 83 | Zod schemas + exported interfaces + parseFindingsSafe |
| concurrency-pool.ts | 69 | O(1) FIFO semaphore (linked list) |
| task-source.ts | 248 | TaskSource interface + GithubIssueSource (fetch + sanitize + plan) |
| refinement-loop.ts | 580 | Core loop: implement↔critic + dedup + ratchet + convergence |
| finalizer.ts | 281 | Validate + retry + rebase + push + PR creation |
| main.ts | 103 | Thin orchestrator: discover → pool → loop → finalize |
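
As an illustration of what an O(1) FIFO semaphore backed by a linked list might look like, here is a minimal sketch. The class shape (`run`, `max`, `running`) follows names visible in the review excerpts, but the body is an assumption and not the contents of concurrency-pool.ts.

```typescript
interface Waiter {
  resolve: () => void;
  next: Waiter | null;
}

class ConcurrencyPool {
  private readonly max: number;
  private running = 0;
  private head: Waiter | null = null;
  private tail: Waiter | null = null;

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError("max must be a positive integer");
    }
    this.max = max;
  }

  /** Runs `task` once a slot is free; waiters are served in arrival order. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.running < this.max) {
      this.running++;
      return Promise.resolve();
    }
    // Enqueue at the tail: O(1), no array shifting.
    return new Promise<void>((resolve) => {
      const waiter: Waiter = { resolve, next: null };
      if (this.tail) this.tail.next = waiter;
      else this.head = waiter;
      this.tail = waiter;
    });
  }

  private release(): void {
    const waiter = this.head;
    if (waiter) {
      // Hand the slot directly to the oldest waiter; `running` is unchanged.
      this.head = waiter.next;
      if (!this.head) this.tail = null;
      waiter.resolve();
    } else {
      this.running--;
    }
  }
}
```

Handing the slot to the next waiter inside `release` avoids a window in which a newly arriving task could jump the queue.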

Prompts

| Prompt | Role | Key rules |
| --- | --- | --- |
| plan-prompt.md | Issue selection | Prefer single-file scope, exclude blocked |
| implement-prompt.md | Code + commit + push | Cross-validate findings, full validation before push |
| critic-prompt.md | Structured review | ≤5 HIGH/CRIT findings, nonce-tagged JSON, known decisions blocklist |

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Refactoring (no functional changes)

Checklist

  • I have run npm run type-check && npm run test && npm run prettier-check && npm run lint
  • I have run npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2
  • My changes follow the existing code style
  • E2E tested locally (planner + parallel implementers started successfully)

Related Issues

Fixes #110

Replace single-pass implement→review→merge with iterative implement↔critic
loop per task. Key changes:

- Orchestrator fetches and sanitizes issues (prevents prompt injection)
- Implement↔Critic loop with deterministic dedup convergence
- Critic produces structured findings (nonce-tagged JSON, zod-validated)
- Decreasing iteration budget per round [100, 50, 25, 10, 10]
- Host-side validation and rebase (no agent needed)
- One PR per task (no merger agent)
- Draft PR on non-convergence with outstanding findings listed

Implements #110
Copilot AI review requested due to automatic review settings May 4, 2026 23:26
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Sandcastle automation workflow to replace the prior single-pass implement→review→merge flow with an iterative implement↔critic refinement loop, aiming for deterministic convergence based on deduplicated structured findings, followed by host-side validation and PR creation.

Changes:

  • Pre-fetch and sanitize GitHub issues in the orchestrator, passing issue data into the planner/implementer prompts instead of shell-expanding gh calls inside prompts.
  • Add a new Critic agent prompt + parsing/dedup logic to iterate implement→critic rounds until no new findings are produced (or a hard cap is reached).
  • Remove the separate review/merge prompt phases and move validation + PR creation to host-side execSync calls.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| .sandcastle/main.ts | Implements the implement↔critic loop, findings parsing/dedup, host-side validation, rebase/push, and PR creation. |
| .sandcastle/plan-prompt.md | Switches planner input from gh issue list ... shell expansion to injected {{ISSUES_JSON}}. |
| .sandcastle/implement-prompt.md | Switches issue input to injected {{ISSUE_BODY}} and adds {{FINDINGS}} as refinement input. |
| .sandcastle/critic-prompt.md | New prompt defining nonce-tagged JSON findings output for the critic agent. |
| .sandcastle/review-prompt.md | Removed (review agent phase eliminated). |
| .sandcastle/merge-prompt.md | Removed (merge agent phase eliminated). |

Comment thread .sandcastle/main.ts Outdated
Comment on lines 174 to 181
@@ -115,99 +180,186 @@ for (let iteration = 1; iteration <= MAX_PLANNER_RETRIES; iteration++) {
sandbox: docker({ imageName: DOCKER_IMAGE }),
});
Comment thread .sandcastle/main.ts Outdated
function parseFindings(stdout: string, nonce: string): Finding[] | null {
const tagPattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)<\\/findings-${nonce}>`, "g");
const matches = [...stdout.matchAll(tagPattern)];
const raw = matches.at(-1)?.[1]?.trim() ?? "[]";
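
The excerpt above falls back to "[]" when no tagged block is found, which is indistinguishable from a critic that reported zero findings. The following sketch shows one way to keep the two cases apart. It is hypothetical: `parseFindingsSketch`, `isFinding`, and the two-field `Finding` shape are assumptions, and a plain type guard stands in for the zod schema the PR uses.

```typescript
interface Finding {
  title: string;
  severity: string;
}

function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const item = value as Record<string, unknown>;
  return typeof item["title"] === "string" && typeof item["severity"] === "string";
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Returns the findings from the LAST block tagged with this run's nonce,
 * or null when no such block exists or its payload is not a JSON array.
 */
function parseFindingsSketch(stdout: string, nonce: string): Finding[] | null {
  const tag = `findings-${escapeRegExp(nonce)}`;
  const pattern = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, "g");
  const matches = Array.from(stdout.matchAll(pattern));
  const last = matches[matches.length - 1];
  if (last === undefined) return null; // missing block is a failure, not "no findings"

  try {
    const parsed: unknown = JSON.parse((last[1] ?? "").trim());
    if (!Array.isArray(parsed)) return null;
    return parsed.filter(isFinding);
  } catch {
    return null; // malformed JSON
  }
}
```

Because the nonce is generated per run, a tag embedded in reviewed code with a different suffix never matches, and a caller can treat `null` as "retry or count the round as failed" instead of as convergence.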
Comment thread .sandcastle/main.ts Outdated
Comment on lines +329 to +336
const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;

try {
execSync(
`gh pr create${draftFlag} --head "${issue.branch}" --base main --title "${prTitle}" --body "${prBody.replace(/"/g, '\\"')}"`,
{ cwd, stdio: "pipe" },
);
Comment thread .sandcastle/main.ts Outdated
Comment on lines +328 to +330

const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;
Comment thread .sandcastle/main.ts Outdated
Comment on lines 115 to 129
let issues: { body: string; branch: string; id: string; title: string }[];
try {
const parsed = JSON.parse(planContent) as { issues: unknown[] };
if (!Array.isArray(parsed.issues)) {
console.error("Planner output missing issues array. Skipping iteration.");
console.error("Planner output missing issues array. Retrying.");
continue;
}
const validated = parsed.issues.filter(
(entry): entry is { branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) {
console.warn(" Skipping non-object issue entry");
return false;
}
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) {
console.warn(` Skipping issue with invalid id: ${String(item.id)}`);
return false;
}
if (typeof item.branch !== "string") {
console.warn(" Skipping issue with missing branch");
return false;
}
if (typeof item.title !== "string") {
console.warn(" Skipping issue with missing title");
return false;
}
if (!BRANCH_PATTERN.test(item.branch)) {
console.warn(` Skipping issue with invalid branch: ${item.branch}`);
return false;
}
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !BRANCH_PATTERN.test(item.branch)) return false;
if (typeof item.title !== "string") return false;
return true;
…ting, zod validation)

- Replace execSync with execFileSync for gh pr create (prevents shell injection)
- Guard parseFindings against empty matches (prevents false convergence)
- Add try/catch on gh issue list startup call
- Guard git push in rebase catch block
- Extract finalizeIssue function (reduces nesting from 6+ to 3 levels)
- Add zod schema for rawIssues (replaces unsafe 'as' cast)
- Implement validation retry round per spec (one more implement→critic if budget remains)
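
The first item can be illustrated with a short sketch: passing an argument array to `execFile` means the PR title and body never pass through a shell, so quotes, backticks, and `$(...)` in issue text stay literal. `PrRequest`, `buildPrCreateArgs`, and `createPr` are hypothetical names; the `gh pr create` flags are the ones already used in the excerpts above, and the promisified form is the non-blocking variant of the same idea.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Promisified execFile: no shell is spawned and the event loop is not blocked.
const execFileAsync = promisify(execFile);

interface PrRequest {
  branch: string;
  title: string;
  body: string;
  draft: boolean;
}

/** Builds the argv for `gh pr create`; each value is a single argument. */
function buildPrCreateArgs(req: PrRequest): string[] {
  const args = [
    "pr", "create",
    "--head", req.branch,
    "--base", "main",
    "--title", req.title,
    "--body", req.body,
  ];
  if (req.draft) args.push("--draft");
  return args;
}

/** Creates the PR from `cwd` and returns whatever gh prints on stdout. */
async function createPr(req: PrRequest, cwd: string): Promise<string> {
  const { stdout } = await execFileAsync("gh", buildPrCreateArgs(req), { cwd });
  return stdout.trim();
}
```

With the string-interpolated `execSync` form, escaping only double quotes still leaves backticks and `$()` live; the array form removes the need for escaping altogether.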
Copilot AI review requested due to automatic review settings May 4, 2026 23:52
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Comment thread .sandcastle/main.ts Outdated
Comment on lines +77 to +97
// --- Validation retry round (fix #7) ---
if (!validationPassed && round < MAX_CRITIC_ROUNDS) {
const retryBudget = ITERATION_BUDGET[MAX_CRITIC_ROUNDS - 1] ?? 10;
console.log(
` #${issue.id}: Retrying one more implement→critic round (budget: ${String(retryBudget)})`,
);

try {
await sandbox.run({
agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
maxIterations: retryBudget,
name: `Implementer #${issue.id} retry`,
promptArgs: {
BRANCH: issue.branch,
FINDINGS: lastFindings.length > 0 ? JSON.stringify(lastFindings, null, 2) : "",
ISSUE_BODY: issue.body,
ISSUE_TITLE: issue.title,
TASK_ID: issue.id,
},
promptFile: "./.sandcastle/implement-prompt.md",
});
Comment thread .sandcastle/main.ts Outdated
Comment on lines +296 to +317
let issues: { body: string; branch: string; id: string; title: string }[];
try {
const parsed = JSON.parse(planContent) as { issues: unknown[] };
if (!Array.isArray(parsed.issues)) {
console.error("Planner output missing issues array. Skipping iteration.");
console.error("Planner output missing issues array. Retrying.");
continue;
}
const validated = parsed.issues.filter(
(entry): entry is { branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) {
console.warn(" Skipping non-object issue entry");
return false;
}
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) {
console.warn(` Skipping issue with invalid id: ${String(item.id)}`);
return false;
}
if (typeof item.branch !== "string") {
console.warn(" Skipping issue with missing branch");
return false;
}
if (typeof item.title !== "string") {
console.warn(" Skipping issue with missing title");
return false;
}
if (!BRANCH_PATTERN.test(item.branch)) {
console.warn(` Skipping issue with invalid branch: ${item.branch}`);
return false;
}
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !BRANCH_PATTERN.test(item.branch)) return false;
if (typeof item.title !== "string") return false;
return true;
},
);
issues = validated;
// Attach sanitized body from our fetched data
issues = validated.map((v) => ({
...v,
body: issuesJson.find((i) => String(i.number) === v.id)?.body ?? "",
}));
Comment thread .sandcastle/main.ts Outdated
try {
execSync(
"npm run type-check && npm run test && npm run test:node && npm run test:edge && npm run prettier-check && npm run lint && npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2",
{ cwd, stdio: "pipe" },
Comment thread .sandcastle/main.ts Outdated
Comment on lines +67 to +75
try {
execSync(
"npm run type-check && npm run test && npm run test:node && npm run test:edge && npm run prettier-check && npm run lint && npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2",
{ cwd, stdio: "pipe" },
);
validationPassed = true;
} catch {
console.warn(` #${issue.id}: Validation failed.`);
}
Comment thread .sandcastle/main.ts Outdated
Comment on lines +172 to +174

const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;
Comment thread .sandcastle/main.ts Outdated
Comment on lines +173 to +174
const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;
Comment thread .sandcastle/main.ts Outdated
Comment on lines +118 to +150
// Rebase on latest main
let rebaseSucceeded = false;
try {
execSync("git fetch origin main && git rebase origin/main", {
cwd,
stdio: "pipe",
});
rebaseSucceeded = true;
if (validationPassed) {
// Post-rebase smoke test
try {
execSync("npm run type-check && npm run test", {
cwd,
stdio: "pipe",
});
} catch {
validationPassed = false;
}
}
} catch {
// Rebase failed — abort and push un-rebased
try {
execSync("git rebase --abort", { cwd, stdio: "pipe" });
} catch {
/* empty */
}
try {
execSync("git push", { cwd, stdio: "pipe" });
} catch (pushErr: unknown) {
const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
console.warn(` #${issue.id}: git push failed after rebase abort: ${pushMsg}`);
}
}
Comment thread .sandcastle/main.ts Outdated
Comment on lines +234 to +240
/**
* @param text - Raw text to strip injection-prone tags from.
* @returns Sanitized text safe for prompt injection.
*/
function sanitizeForPrompt(text: string): string {
return text.replace(/<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi, "");
}
Copilot AI review requested due to automatic review settings May 5, 2026 00:02
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comment thread .sandcastle/main.ts Outdated
Comment on lines +462 to +465
converged = false;
} else {
converged = true;
}
Comment thread .sandcastle/main.ts Outdated
Comment on lines +312 to +326
const validated = parsed.issues.filter(
(entry): entry is { branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) {
console.warn(" Skipping non-object issue entry");
return false;
}
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) {
console.warn(` Skipping issue with invalid id: ${String(item.id)}`);
return false;
}
if (typeof item.branch !== "string") {
console.warn(" Skipping issue with missing branch");
return false;
}
if (typeof item.title !== "string") {
console.warn(" Skipping issue with missing title");
return false;
}
if (!BRANCH_PATTERN.test(item.branch)) {
console.warn(` Skipping issue with invalid branch: ${item.branch}`);
return false;
}
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !BRANCH_PATTERN.test(item.branch)) return false;
if (typeof item.title !== "string") return false;
return true;
},
);
issues = validated;
// Attach sanitized body from our fetched data
issues = validated.map((v) => ({
...v,
body: issuesJson.find((i) => String(i.number) === v.id)?.body ?? "",
}));
Comment thread .sandcastle/main.ts Outdated
Comment on lines +70 to +78
try {
execSync(
"npm run type-check && npm run test && npm run test:node && npm run test:edge && npm run prettier-check && npm run lint && npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2",
{ cwd, stdio: "pipe" },
);
validationPassed = true;
} catch {
console.warn(` #${issue.id}: Validation failed.`);
}
Comment thread .sandcastle/main.ts Outdated
Comment on lines +148 to +151
execSync("git push", { cwd, stdio: "pipe" });
} catch (pushErr: unknown) {
const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
console.warn(` #${issue.id}: git push failed after rebase abort: ${pushMsg}`);
Comment thread .sandcastle/main.ts Outdated
if (!validationPassed && round < MAX_CRITIC_ROUNDS) {
const retryBudget = ITERATION_BUDGET[MAX_CRITIC_ROUNDS - 1] ?? 10;
console.log(
` #${issue.id}: Retrying one more implement→critic round (budget: ${String(retryBudget)})`,
Copilot AI review requested due to automatic review settings May 5, 2026 00:17
@jerome-benoit jerome-benoit force-pushed the feat/sandcastle-refinement-loop branch from f489cea to ee43546 Compare May 5, 2026 00:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread .sandcastle/main.ts Outdated
Comment on lines +475 to +478
converged = false;
} else {
converged = true;
}
Comment thread .sandcastle/main.ts Outdated
Comment on lines 323 to 331
const validated = parsed.issues.filter(
(entry): entry is { branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) {
console.warn(" Skipping non-object issue entry");
return false;
}
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) {
console.warn(` Skipping issue with invalid id: ${String(item.id)}`);
return false;
}
if (typeof item.branch !== "string") {
console.warn(" Skipping issue with missing branch");
return false;
}
if (typeof item.title !== "string") {
console.warn(" Skipping issue with missing title");
return false;
}
if (!BRANCH_PATTERN.test(item.branch)) {
console.warn(` Skipping issue with invalid branch: ${item.branch}`);
return false;
}
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !BRANCH_PATTERN.test(item.branch)) return false;
if (typeof item.title !== "string") return false;
return true;
},
Comment thread .sandcastle/main.ts Outdated
Comment on lines +187 to +188
const prTitle = `${commitPrefix}: resolve #${issue.id} — ${issue.title}`;
const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n${validationCheck} I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;
Split main.ts (525 lines) into 6 self-contained modules:
- types.ts: shared domain types (TaskSpec, Finding, LoopResult, FinalizeResult)
- refinement-loop.ts: reusable implement↔critic loop engine
- finalizer.ts: validation, rebase, PR creation
- concurrency-pool.ts: semaphore utility
- task-source.ts: TaskSource interface + GithubIssueSource
- main.ts: 74-line thin orchestrator wiring all modules

The refinement loop is now reusable by any task source (GitHub issues,
CI failures, manual triggers) without coupling to the planner.
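
A sketch of how a second task source could plug in without touching the loop is shown below. All names here are assumptions for illustration: the method name `discover`, the `ManualTaskSource` class, and the `BRANCH_PATTERN` regex are hypothetical, and the field checks mirror the id/branch/title validation visible in the review excerpts.

```typescript
interface TaskSpec {
  id: string;
  branch: string;
  title: string;
  body: string;
}

/** Hypothetical contract: anything that can produce validated task specs. */
interface TaskSource {
  discover(): Promise<TaskSpec[]>;
}

// Hypothetical branch rule; the PR's actual pattern is not shown in this page.
const BRANCH_PATTERN = /^[a-z0-9][a-z0-9/_-]*$/;

/** Type guard used instead of an unchecked `as` cast. */
function isTaskSpec(entry: unknown): entry is TaskSpec {
  if (typeof entry !== "object" || entry === null) return false;
  const item = entry as Record<string, unknown>;
  const id = item["id"];
  const branch = item["branch"];
  return (
    typeof id === "string" &&
    /^\d+$/.test(id) &&
    typeof branch === "string" &&
    BRANCH_PATTERN.test(branch) &&
    typeof item["title"] === "string" &&
    typeof item["body"] === "string"
  );
}

/** A source fed from untrusted, manually supplied entries. */
class ManualTaskSource implements TaskSource {
  private readonly entries: unknown[];

  constructor(entries: unknown[]) {
    this.entries = entries;
  }

  discover(): Promise<TaskSpec[]> {
    // Invalid entries are dropped rather than passed on to a sandbox.
    return Promise.resolve(this.entries.filter(isTaskSpec));
  }
}
```

The orchestrator would then depend only on the interface, so swapping GitHub issues for CI failures means writing one new class.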
Copilot AI review requested due to automatic review settings May 5, 2026 12:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +97 to +102
if (nonLowFindings.length > 0) {
lastFindings = nonLowFindings;
status = "exhausted";
} else {
status = "converged";
}
Comment on lines +123 to +127
try {
rawIssuesJson = execSync(
`gh issue list --state open --json number,title,labels,body --limit 50 --label "${this.label}"`,
{ encoding: "utf-8" },
);
Comment thread .sandcastle/task-source.ts Outdated
Comment on lines +159 to +175
const validated = parsed.issues.filter(
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !this.branchPattern.test(item.branch))
return false;
if (typeof item.title !== "string") return false;
return true;
},
);

return validated.map((v) => ({
...v,
body: issuesJson.find((i) => String(i.number) === v.id)?.body ?? "",
labels: issuesJson.find((i) => String(i.number) === v.id)?.labels ?? [],
}));
return tasks;
}

return [];
Copilot AI review requested due to automatic review settings May 5, 2026 12:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +100 to +108
if (newFindings.length === 0) {
const nonLowFindings = findings.filter((f) => f.confidence !== "LOW");
if (nonLowFindings.length > 0) {
lastFindings = nonLowFindings;
status = "exhausted";
} else {
status = "converged";
}
break;
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +72 to +73
status = "exhausted";
break;
Comment on lines +7 to +8
Run `git diff main...{{BRANCH}}` to see all changes. Examine the diff carefully. For each issue found, produce a structured finding.

Comment on lines +171 to +180
const validated = parsed.issues.filter(
(entry): entry is { body: string; branch: string; id: string; title: string } => {
if (typeof entry !== "object" || entry === null) return false;
const item = entry as Record<string, unknown>;
if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
if (typeof item.branch !== "string" || !this.branchPattern.test(item.branch))
return false;
if (typeof item.title !== "string") return false;
return true;
},
Comment on lines +6 to +12
private running = 0;

/**
* @param max - Maximum number of concurrent tasks.
*/
constructor(private readonly max: number) {}

Comment thread .sandcastle/finalizer.ts
Comment on lines +38 to +75
if (!validationPassed && loopResult.roundsCompleted < MAX_CRITIC_ROUNDS) {
const retryBudget = ITERATION_BUDGET[MAX_CRITIC_ROUNDS - 1] ?? 10;
console.log(
` #${spec.id}: Retrying one more implement round (budget: ${String(retryBudget)})`,
);

try {
await sandbox.run({
agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
maxIterations: retryBudget,
name: `Implementer #${spec.id} retry`,
promptArgs: {
BRANCH: spec.branch,
FINDINGS:
loopResult.lastFindings.length > 0
? JSON.stringify(loopResult.lastFindings, null, 2)
: "",
ISSUE_BODY: spec.body,
ISSUE_TITLE: spec.title,
TASK_ID: spec.id,
},
promptFile: "./.sandcastle/implement-prompt.md",
});
} catch (retryErr: unknown) {
const retryMsg = retryErr instanceof Error ? retryErr.message : String(retryErr);
console.warn(
` #${spec.id}: Implementer retry threw: ${retryMsg}. Falling through to PR creation.`,
);
}

try {
execSync(VALIDATION_COMMAND, { cwd, stdio: "pipe" });
validationPassed = true;
console.log(` #${spec.id}: Validation passed after retry round.`);
} catch {
console.warn(` #${spec.id}: Validation still fails after retry. Will create draft PR.`);
}
}
Comment on lines +195 to +202
/**
* Strips injection-prone tags from text.
* @param text - Raw text to sanitize.
* @returns Sanitized text safe for prompt injection.
*/
function sanitizeForPrompt(text: string): string {
return text.replace(/<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi, "");
}
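
One limitation of a plain regex strip is that it only sees ASCII angle brackets. The sketch below, an illustration and not the PR's final code, compares the function above with a variant that applies Unicode NFKC normalization first, so full-width look-alikes such as U+FF1C and U+FF1E are folded to `<` and `>` before matching. `sanitizeNormalized` is a hypothetical name.

```typescript
const TAG_PATTERN = /<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi;

/** Strips the reserved tags as written (ASCII brackets only). */
function sanitizeForPrompt(text: string): string {
  return text.replace(TAG_PATTERN, "");
}

/** Same strip, but after folding compatibility characters with NFKC. */
function sanitizeNormalized(text: string): string {
  return text.normalize("NFKC").replace(TAG_PATTERN, "");
}

// Full-width "<plan>injected</plan>" written with escapes for clarity.
const fullWidth = "\uFF1Cplan\uFF1Einjected\uFF1C\uFF0Fplan\uFF1E";

console.log(sanitizeForPrompt(fullWidth) === fullWidth); // tags survive untouched
console.log(sanitizeNormalized(fullWidth)); // tags removed after normalization
```

Note that NFKC also rewrites other compatibility characters (ligatures, full-width letters), so the sanitized text can differ from the original issue body in more than the stripped tags.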
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comment thread .sandcastle/main.ts
Comment on lines 29 to +69
const settled = await Promise.allSettled(
issues.map(async (issue) => {
await acquire();
try {
await using sandbox = await sandcastle.createSandbox({
branch: issue.branch,
copyToWorktree: ["node_modules"],
hooks: {
sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
},
sandbox: docker({ imageName: DOCKER_IMAGE }),
});
tasks.map((spec) =>
pool.run(() =>
Promise.race([
(async () => {
await using sandbox = await sandcastle.createSandbox({
branch: spec.branch,
copyToWorktree: ["node_modules"],
hooks: {
sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
},
sandbox: docker({ imageName: DOCKER_IMAGE }),
});

const result = await sandbox.run({
agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
maxIterations: 100,
name: "Implementer #" + issue.id,
promptArgs: {
BRANCH: issue.branch,
ISSUE_TITLE: issue.title,
TASK_ID: issue.id,
},
promptFile: "./.sandcastle/implement-prompt.md",
});
const loopResult = await runRefinementLoop(spec, sandbox, {
iterationBudget: ITERATION_BUDGET_PER_ROUND,
maxRounds: MAX_CRITIC_ROUNDS,
});

if (result.commits.length > 0) {
try {
await sandbox.run({
agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
maxIterations: 10,
name: "Reviewer #" + issue.id,
promptArgs: {
BRANCH: issue.branch,
},
promptFile: "./.sandcastle/review-prompt.md",
let prCreated = false;
if (loopResult.totalCommits > 0) {
const cwd = sandbox.worktreePath;
const result = await finalizeTask(spec, loopResult, sandbox, cwd);
prCreated = result.prCreated;
}

return { prCreated, spec };
})(),
(() => {
const p = new Promise<never>((_, reject) => {
setTimeout(() => {
reject(new Error(`Task #${spec.id} timed out after ${String(TASK_TIMEOUT_MS)}ms`));
}, TASK_TIMEOUT_MS).unref();
});
p.catch(() => {
/* suppress unhandled rejection when task completes before timeout */
});
} catch (reviewError: unknown) {
const msg = reviewError instanceof Error ? reviewError.message : String(reviewError);
console.warn(` Reviewer for #${issue.id} failed, proceeding unreviewed: ${msg}`);
}
}
return p;
})(),
]),
),
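
The timeout arm of the Promise.race in the excerpt above can be isolated into a helper. This is a sketch of an alternative shape, not the code under review: `withTimeout` is a hypothetical name, and it clears the timer when the race settles instead of calling `unref()` and attaching a no-op `catch`.

```typescript
/**
 * Rejects with a labeled error if `work` does not settle within `ms`.
 * The timer is cleared once the race settles, so no rejection is left
 * dangling and the process is not kept alive by the timer.
 */
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${String(ms)}ms`));
    }, ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Either shape only stops waiting: the underlying work keeps running after a timeout, so sandbox disposal or an abort signal is still needed to actually cancel it.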
Comment thread .sandcastle/task-source.ts Outdated
const source = issueMap.get(v.id);
if (!source) return null;
return {
...v,
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +256 to +272
function findingKey(f: Finding, cwd: string, fileCache?: Map<string, string>): string {
if (!f.file || f.line == null) {
const normalizedTitle = f.title
.toLowerCase()
.replace(/[^\w\s]/g, "")
.replace(/\s+/g, " ")
.trim();
const titleHash = crypto
.createHash("sha256")
.update(normalizedTitle)
.digest("hex")
.slice(0, 16);
return `${f.file || "global"}::${f.category}::${titleHash}`;
}
const contextHash = hashContextLines(cwd, f.file, f.line, 3, fileCache);
return `${f.file}::${f.category}::${contextHash}`;
}
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +144 to +148
// Validate SHA format before passing to execFileSync
if (!/^[0-9a-f]{40}$/.test(beforeSha)) {
console.warn(` #${spec.id}: Invalid SHA for rollback, skipping reset.`);
return true;
}
Comment thread .sandcastle/finalizer.ts Outdated
Comment on lines +204 to +226
function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
if (rebaseSucceeded) {
try {
execFileSync("git", ["push", "--force-with-lease"], { cwd, stdio: "pipe" });
return true;
} catch (pushErr: unknown) {
const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
try {
const suffix = crypto.randomBytes(4).toString("hex");
execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
cwd,
stdio: "pipe",
});
console.warn(
` #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
);
} catch {
console.error(
` #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
);
}
return false;
}
Copilot AI review requested due to automatic review settings May 5, 2026 16:30
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

function sanitizeForPrompt(text: string): string {
const normalized = text.normalize("NFKC");
return normalized.replace(
/<\/?(?:plan|findings|promise|system|code|instructions|implement|review|tool_call)[^>]*>/gi,
Comment thread .sandcastle/types.ts
Comment on lines +64 to +72
/** Maximum implement↔critic rounds before giving up. */
export const MAX_CRITIC_ROUNDS = 5;

/**
* Flat iteration budget per round (intentionally constant, not decreasing).
* Evidence: ARCS (arXiv:2504.20434), SWE-Agent, AutoCodeRover all use flat budgets.
* Decreasing schedules penalize harder residual problems in later rounds.
*/
export const ITERATION_BUDGET_PER_ROUND = 50;
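
The arithmetic behind the comment can be made concrete. The sketch below compares the flat schedule with the decreasing schedule `[100, 50, 25, 10, 10]` listed in this PR's first commit message; `flatSchedule`, `total`, and `lateRoundShare` are hypothetical helpers for illustration only.

```typescript
const MAX_ROUNDS = 5;
const FLAT_BUDGET = 50;
const DECREASING_BUDGET = [100, 50, 25, 10, 10];

/** Same budget for every round. */
function flatSchedule(rounds: number, perRound: number): number[] {
  return Array.from({ length: rounds }, () => perRound);
}

function total(schedule: number[]): number {
  return schedule.reduce((sum, n) => sum + n, 0);
}

/** Fraction of the total budget available from `fromRound` (1-based) onward. */
function lateRoundShare(schedule: number[], fromRound: number): number {
  return total(schedule.slice(fromRound - 1)) / total(schedule);
}

const flat = flatSchedule(MAX_ROUNDS, FLAT_BUDGET);
console.log(total(flat), total(DECREASING_BUDGET));
console.log(lateRoundShare(flat, 4), lateRoundShare(DECREASING_BUDGET, 4));
```

Under the flat schedule, rounds 4 and 5 together hold 100 of 250 iterations; under the decreasing one they hold 20 of 195, which is the penalty on late-round residual problems that the comment describes.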
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +249 to +272
/**
* Computes a deduplication key for a finding using a context hash of surrounding lines.
* @param f - Finding to compute a key for.
* @param cwd - Working directory (worktree path) for reading file context.
* @param fileCache - Optional cache of file contents keyed by resolved path.
* @returns Composite dedup key.
*/
function findingKey(f: Finding, cwd: string, fileCache?: Map<string, string>): string {
if (!f.file || f.line == null) {
const normalizedTitle = f.title
.toLowerCase()
.replace(/[^\w\s]/g, "")
.replace(/\s+/g, " ")
.trim();
const titleHash = crypto
.createHash("sha256")
.update(normalizedTitle)
.digest("hex")
.slice(0, 16);
return `${f.file || "global"}::${f.category}::${titleHash}`;
}
const contextHash = hashContextLines(cwd, f.file, f.line, 3, fileCache);
return `${f.file}::${f.category}::${contextHash}`;
}
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +132 to +163
function checkQualityRatchet(
  spec: TaskSpec,
  round: number,
  findingsCount: number,
  previousCount: number,
  beforeSha: string,
  cwd: string,
): boolean {
  if (round <= 2 || findingsCount <= previousCount) {
    return false;
  }

  // Validate SHA format before passing to execFileSync
  if (!/^[0-9a-f]{40}$/.test(beforeSha)) {
    console.warn(` #${spec.id}: Invalid SHA for rollback, skipping reset.`);
    return true;
  }

  try {
    execFileSync("git", ["reset", "--hard", beforeSha], {
      cwd,
      stdio: "pipe",
    });
    console.warn(
      ` #${spec.id} R${String(round)}: Regression detected (${String(previousCount)} → ${String(findingsCount)}). Rolled back.`,
    );
  } catch {
    console.warn(` #${spec.id}: Failed to reset to ${beforeSha} after regression.`);
  }

  return true;
}
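The decision logic in `checkQualityRatchet` can be separated from the `git reset` side effect. The sketch below is a hypothetical pure function (`decideRatchet` and the `RatchetDecision` labels are names invented here) that mirrors the two guards in the snippet: no ratchet during the first two rounds or when findings did not increase, and no reset when the SHA is not a full 40-character hex string.

```typescript
/** Full 40-character lowercase hex SHA-1, the same pattern the snippet checks. */
const FULL_SHA = /^[0-9a-f]{40}$/;

// Hypothetical labels; the PR's function returns a boolean instead.
type RatchetDecision = "keep" | "rollback" | "regressed-no-rollback";

function decideRatchet(
  round: number,
  findingsCount: number,
  previousCount: number,
  beforeSha: string,
): RatchetDecision {
  // Rounds 1 and 2 are exempt, and a non-increasing count is not a regression.
  if (round <= 2 || findingsCount <= previousCount) {
    return "keep";
  }
  // A regression was detected; only reset when the target SHA is well-formed.
  return FULL_SHA.test(beforeSha) ? "rollback" : "regressed-no-rollback";
}

console.log(decideRatchet(3, 5, 1, "a".repeat(40)));
```

In the PR's version both regression branches return `true`, so the caller cannot tell from the return value alone whether the reset actually happened; a three-valued result like the one above is one way to make that distinction explicit.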
Comment on lines +174 to +179
const newFindings = findings.filter(
  (f) => f.confidence !== "LOW" && !seenKeys.has(findingKey(f, cwd, fileCache)),
);
for (const f of newFindings) {
  seenKeys.add(findingKey(f, cwd, fileCache));
}
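The snippet above computes `findingKey` twice per finding and only updates `seenKeys` after filtering, so two identical findings in the same batch would both pass. A single-pass variant is sketched below; `takeNewFindings` and the `keyOf` callback are hypothetical names, and dropping in-batch duplicates is a behavioral change relative to the snippet, not a description of what the PR does.

```typescript
/**
 * Hypothetical single-pass dedup: skips LOW-confidence findings, computes each
 * key once, and records it immediately so in-batch duplicates are dropped too.
 */
function takeNewFindings<T extends { confidence: string }>(
  findings: T[],
  seen: Set<string>,
  keyOf: (finding: T) => string,
): T[] {
  const fresh: T[] = [];
  for (const finding of findings) {
    if (finding.confidence === "LOW") continue;
    const key = keyOf(finding);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(finding);
  }
  return fresh;
}

const seenKeys = new Set<string>(["a.ts::bug::1"]);
const batch = [
  { id: "a.ts::bug::1", confidence: "HIGH" }, // already seen in an earlier round
  { id: "b.ts::bug::2", confidence: "LOW" }, // filtered by confidence
  { id: "c.ts::bug::3", confidence: "MEDIUM" }, // new
  { id: "c.ts::bug::3", confidence: "HIGH" }, // duplicate within this batch
];
const fresh = takeNewFindings(batch, seenKeys, (f) => f.id);
console.log(fresh.map((f) => f.id));
```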
Comment thread .sandcastle/main.ts Outdated
Comment on lines +29 to +68
  const settled = await Promise.allSettled(
- issues.map(async (issue) => {
- await acquire();
- try {
- await using sandbox = await sandcastle.createSandbox({
- branch: issue.branch,
- copyToWorktree: ["node_modules"],
- hooks: {
- sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
- },
- sandbox: docker({ imageName: DOCKER_IMAGE }),
- });
+ tasks.map((spec) =>
+ pool.run(() =>
+ Promise.race([
+ (async () => {
+ await using sandbox = await sandcastle.createSandbox({
+ branch: spec.branch,
+ copyToWorktree: ["node_modules"],
+ hooks: {
+ sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
+ },
+ sandbox: docker({ imageName: DOCKER_IMAGE }),
+ });

- const result = await sandbox.run({
- agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
- maxIterations: 100,
- name: "Implementer #" + issue.id,
- promptArgs: {
- BRANCH: issue.branch,
- ISSUE_TITLE: issue.title,
- TASK_ID: issue.id,
- },
- promptFile: "./.sandcastle/implement-prompt.md",
- });
+ const loopResult = await runRefinementLoop(spec, sandbox, {
+ iterationBudget: ITERATION_BUDGET_PER_ROUND,
+ maxRounds: MAX_CRITIC_ROUNDS,
+ });

- if (result.commits.length > 0) {
- try {
- await sandbox.run({
- agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
- maxIterations: 10,
- name: "Reviewer #" + issue.id,
- promptArgs: {
- BRANCH: issue.branch,
- },
- promptFile: "./.sandcastle/review-prompt.md",
+ let prCreated = false;
+ if (loopResult.totalCommits > 0) {
+ const cwd = sandbox.worktreePath;
+ const result = await finalizeTask(spec, loopResult, sandbox, cwd);
+ prCreated = result.prCreated;
+ }
+
+ return { prCreated, spec };
+ })(),
+ (() => {
+ const p = new Promise<never>((_, reject) => {
+ setTimeout(() => {
+ reject(new Error(`Task #${spec.id} timed out after ${String(TASK_TIMEOUT_MS)}ms`));
+ }, TASK_TIMEOUT_MS).unref();
+ });
+ p.catch(() => {
+ /* suppress unhandled rejection when task completes before timeout */
  });
- } catch (reviewError: unknown) {
- const msg = reviewError instanceof Error ? reviewError.message : String(reviewError);
- console.warn(` Reviewer for #${issue.id} failed, proceeding unreviewed: ${msg}`);
- }
- }
+ return p;
+ })(),
+ ]),
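A later commit message in this PR mentions extracting a `withTimeout` helper from this inline `Promise.race`. Its actual code is not shown here, so the following is only a sketch of one way such a helper could look. It clears the timer once the race settles, which avoids both the dangling timer and the need for the `p.catch` suppression used in the inline version.

```typescript
/**
 * Hypothetical helper: rejects if `task` has not settled within `ms`.
 * It does not cancel the underlying work; it only stops waiting for it.
 */
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${String(ms)}ms`));
    }, ms);
  });
  // Once either side wins, clear the timer so it can never fire afterwards.
  return Promise.race([task, timeout]).finally(() => {
    clearTimeout(timer);
  });
}

void withTimeout(Promise.resolve("done"), 1000, "Task #1").then((value) => {
  console.log(value);
});
```

Note that a timed-out task keeps running in the background in this sketch, so the sandbox it holds is released only when the task itself finishes.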
Comment thread .sandcastle/finalizer.ts Outdated
Comment on lines +204 to +231
function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
  if (rebaseSucceeded) {
    try {
      execFileSync("git", ["push", "--force-with-lease"], { cwd, stdio: "pipe" });
      return true;
    } catch (pushErr: unknown) {
      const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
      try {
        const suffix = crypto.randomBytes(4).toString("hex");
        execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
          cwd,
          stdio: "pipe",
        });
        console.warn(
          ` #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
        );
      } catch {
        console.error(
          ` #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
        );
      }
      return false;
    }
  } else {
    try {
      execFileSync("git", ["push"], { cwd, stdio: "pipe" });
      return true;
    } catch (pushErr: unknown) {
Comment thread .sandcastle/types.ts
export type LoopStatus = "converged" | "exhausted" | "failed" | "skipped";

/** Type alias for a sandcastle sandbox instance. */
export type SandboxInstance = Awaited<ReturnType<typeof sandcastle.createSandbox>>;
…nHands)

- Validation in-loop (ARCS): deterministic convergence when tests pass mid-loop
- Best-state checkpoint (SWE-Agent): reset to best SHA on non-convergence
- Severity-weighted convergence (OpenHands): refuse convergence if CRITICAL/HIGH persist
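As an illustration of the severity-weighted rule in the last bullet, here is a minimal sketch. The `Severity` labels below CRITICAL/HIGH, the `SeverityFinding` shape, and the function name `mayConverge` are assumptions; the PR's `checkConvergence` is not shown in full and may weigh other signals as well.

```typescript
// CRITICAL and HIGH come from the commit message; MEDIUM and LOW are assumed.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface SeverityFinding {
  severity: Severity;
  title: string;
}

/**
 * Hypothetical rule: a round may converge only when the critic reported
 * nothing new AND no CRITICAL/HIGH finding is still present.
 */
function mayConverge(newFindings: SeverityFinding[], allFindings: SeverityFinding[]): boolean {
  const blocking = allFindings.some(
    (f) => f.severity === "CRITICAL" || f.severity === "HIGH",
  );
  return newFindings.length === 0 && !blocking;
}

const persistent: SeverityFinding[] = [{ severity: "HIGH", title: "unchecked input" }];
// No new findings this round, but a HIGH finding persists: convergence is refused.
console.log(mayConverge([], persistent));
```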
Copilot AI review requested due to automatic review settings May 5, 2026 16:50
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- New constants.ts: shared constants (VALIDATION_COMMAND, timeouts, model names) + utilities (getHeadSha, toErrorMessage)
- refinement-loop.ts: decompose runRefinementLoop (CC 17→≤10), RoundContext/HashInput param objects, computeFindingKey rename
- finalizer.ts: add timeouts to all execFileSync, use runValidation helper consistently
- task-source.ts: add timeout, replace char loop with regex, fix terse names
- main.ts: extract withTimeout helper, use model constants
- types.ts: unexport FindingsSchema (internal only)
Copilot AI review requested due to automatic review settings May 5, 2026 18:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Comment thread .sandcastle/types.ts
export type LoopStatus = "converged" | "exhausted" | "failed" | "skipped";

/** Type alias for a sandcastle sandbox instance. */
export type SandboxInstance = Awaited<ReturnType<typeof sandcastle.createSandbox>>;
Comment on lines +445 to +451
.update(`${file}:${String(line)}:${normalized}`)
.digest("hex")
.slice(0, HASH_PREFIX_LENGTH);
} catch {
return crypto
.createHash("sha256")
.update(`${file}:${String(line)}:fallback`)
Comment on lines +463 to +479
function parseFindings(stdout: string, nonce: string): Finding[] | null {
  if (!/^[0-9a-f]+$/.test(nonce)) return null;
  const tagPattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)<\\/findings-${nonce}>`, "g");
  const matches = [...stdout.matchAll(tagPattern)];
  if (matches.length === 0) return null;
  // Find last non-trivial match
  for (let i = matches.length - 1; i >= 0; i--) {
    const raw = matches[i]?.[1]?.trim() ?? "";
    if (raw.length < 2) continue;
    const cleaned = raw.replace(/^```(?:json)?\s*\n?/g, "").replace(/\n?```\s*$/g, "");
    try {
      return parseFindingsSafe(JSON.parse(cleaned));
    } catch {
      continue;
    }
  }
  return null;
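The point of the nonce is that code under review cannot forge a findings block, because it cannot know the random tag name chosen for that critic run. The sketch below shows the extraction step in isolation. `extractTaggedJson` is a hypothetical name; it returns raw parsed JSON instead of calling `parseFindingsSafe`, and it omits the Markdown fence stripping done in the PR's version.

```typescript
/**
 * Hypothetical sketch: return the JSON from the LAST well-formed
 * <findings-NONCE>…</findings-NONCE> block, or null if there is none.
 */
function extractTaggedJson(stdout: string, nonce: string): unknown {
  // Reject anything that is not plain hex so the nonce is safe inside a RegExp.
  if (!/^[0-9a-f]+$/.test(nonce)) return null;
  const pattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)</findings-${nonce}>`, "g");
  const bodies: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(stdout)) !== null) {
    bodies.push((match[1] ?? "").trim());
  }
  for (let i = bodies.length - 1; i >= 0; i--) {
    try {
      return JSON.parse(bodies[i] ?? "");
    } catch {
      // Not valid JSON; fall back to an earlier block.
    }
  }
  return null;
}

const output = [
  'echoed source: <findings-ffff>[{"title":"forged"}]</findings-ffff>',
  '<findings-abcd>[{"title":"real"}]</findings-abcd>',
  "<findings-abcd>truncated output</findings-abcd>",
].join("\n");

// Only blocks tagged with the run's own nonce ("abcd") are considered.
console.log(JSON.stringify(extractTaggedJson(output, "abcd")));
```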
Comment thread .sandcastle/refinement-loop.ts Outdated
Comment on lines +277 to +281
  // Validate SHA format before passing to execFileSync
  if (!/^[0-9a-f]{40}$/.test(beforeSha)) {
    console.warn(` #${spec.id}: Invalid SHA for rollback, skipping reset.`);
    return true;
  }
Comment thread .sandcastle/finalizer.ts Outdated
Comment on lines +221 to +247
function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
  if (rebaseSucceeded) {
    try {
      execFileSync("git", ["push", "--force-with-lease"], {
        cwd,
        stdio: "pipe",
        timeout: PUSH_TIMEOUT_MS,
      });
      return true;
    } catch (pushErr: unknown) {
      const pushMsg = toErrorMessage(pushErr);
      try {
        const suffix = crypto.randomBytes(4).toString("hex");
        execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
          cwd,
          stdio: "pipe",
          timeout: PUSH_TIMEOUT_MS,
        });
        console.warn(
          ` #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
        );
      } catch {
        console.error(
          ` #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
        );
      }
      return false;
Comment on lines +124 to +131
if (result.findings === null) break;
const findings: Finding[] = result.findings;

if (result.commits > 0 && runMidLoopValidation(sandbox.worktreePath)) {
  totalCommits += result.commits;
  status = "converged";
  break;
}
…ck event loop)

Replace all blocking execFileSync calls with util.promisify(execFile) to enable
true parallelism between tasks during subprocess execution.

- constants.ts: add execFileAsync export, convert getHeadSha to async
- refinement-loop.ts: captureHeadSha, checkQualityRatchet, checkConvergence,
  runMidLoopValidation, resetToBestState all async
- finalizer.ts: runValidation, attemptRebase, pushBranch all async
- task-source.ts: fetchAndSanitizeIssues async

readFileSync/realpathSync stay sync (<1ms local I/O, no benefit from async).
maxBuffer: 8MB added to validation and gh issue list calls.
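A minimal sketch of the pattern this commit describes, using Node's documented `util.promisify(execFile)`. The wrapper name `runCommand` and the 30-second timeout are assumptions chosen for illustration; the 8 MB `maxBuffer` follows the commit message.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Promisified execFile resolves with { stdout, stderr } and rejects on a
// non-zero exit code, a timeout, or maxBuffer overflow.
const execFileAsync = promisify(execFile);

/** Hypothetical wrapper: run a command without blocking the event loop. */
async function runCommand(file: string, args: string[], cwd?: string): Promise<string> {
  const { stdout } = await execFileAsync(file, args, {
    cwd,
    maxBuffer: 8 * 1024 * 1024, // 8 MB, as in the commit message
    timeout: 30000, // assumed value; the PR uses its own timeout constants
  });
  return stdout.trim();
}

// Two subprocesses run concurrently; with execFileSync they would run back to back.
void Promise.all([
  runCommand(process.execPath, ["-e", "console.log('first')"]),
  runCommand(process.execPath, ["-e", "console.log('second')"]),
]).then((results) => {
  console.log(results.join(","));
});
```

With the synchronous API, one task's `git push` or validation run would stall every other task sharing the same Node process, which defeats the parallel-sandbox design.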
Copilot AI review requested due to automatic review settings May 5, 2026 20:26
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Comment thread .sandcastle/types.ts
Comment on lines +45 to +46
/** Type alias for a sandcastle sandbox instance. */
export type SandboxInstance = Awaited<ReturnType<typeof sandcastle.createSandbox>>;
Comment thread .sandcastle/types.ts
Comment on lines +64 to +69
/**
 * Flat iteration budget per round (intentionally constant, not decreasing).
 * Evidence: ARCS (arXiv:2504.20434), SWE-Agent, AutoCodeRover all use flat budgets.
 * Decreasing schedules penalize harder residual problems in later rounds.
 */
export const ITERATION_BUDGET_PER_ROUND = 50;
Comment on lines +152 to +176
    if (newFindings.length < bestFindingsCount) {
      bestFindingsCount = newFindings.length;
      bestSha = await captureHeadSha(cwd);
    }

    totalCommits += result.commits;
    previousFindingsCount = nonLowFindings.length;
    onRoundComplete(round, findings);

    const convergenceResult = await checkConvergence(cwd, findings, newFindings, nonLowFindings);
    if (convergenceResult !== null) {
      lastFindings = convergenceResult.lastFindings;
      status = convergenceResult.status;
      bestSha = convergenceResult.bestSha;
      break;
    }

    lastFindings = newFindings;
  }

  if (shouldResetToBest(status, bestSha)) {
    totalCommits = await resetToBestState(sandbox.worktreePath, bestSha, totalCommits);
  }

  return { lastFindings, roundsCompleted, status, totalCommits };
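The best-state bookkeeping in this loop can be viewed as a running minimum over rounds. The sketch below isolates that idea as a pure function; `RoundOutcome` and `bestCheckpoint` are hypothetical names, and the strict `<` comparison mirrors the snippet, so on a tie the earliest round wins.

```typescript
interface RoundOutcome {
  sha: string; // HEAD after the round
  newFindings: number; // count of new findings reported by the critic
}

/** Hypothetical helper: pick the round with the fewest new findings (earliest on ties). */
function bestCheckpoint(rounds: RoundOutcome[]): RoundOutcome | null {
  let best: RoundOutcome | null = null;
  for (const round of rounds) {
    if (best === null || round.newFindings < best.newFindings) {
      best = round;
    }
  }
  return best;
}

const history: RoundOutcome[] = [
  { sha: "aaa", newFindings: 4 },
  { sha: "bbb", newFindings: 1 },
  { sha: "ccc", newFindings: 1 },
  { sha: "ddd", newFindings: 3 },
];
// If the loop ends without converging, "bbb" is the state worth resetting to.
console.log(bestCheckpoint(history)?.sha);
```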
Comment on lines +368 to +401
  // Implementer
  let implementerResult: Awaited<ReturnType<typeof sandbox.run>>;
  try {
    implementerResult = await sandbox.run({
      agent: sandcastle.opencode(AGENT_MODEL),
      maxIterations: budget,
      name: `Implementer #${spec.id} R${String(round)}`,
      promptArgs: {
        BRANCH: spec.branch,
        FINDINGS: findingsArg,
        ISSUE_BODY: spec.body,
        ISSUE_TITLE: spec.title,
        TASK_ID: spec.id,
      },
      promptFile: "./.sandcastle/implement-prompt.md",
    });
  } catch (err: unknown) {
    const msg = err instanceof Error ? (err.stack ?? err.message) : String(err);
    console.error(` #${spec.id} R${String(round)}: Implementer threw: ${msg}`);
    return { beforeSha, commits: 0, findings: null };
  }

  // Critic
  const nonce = crypto.randomBytes(4).toString("hex");
  let findings: Finding[] | null;
  try {
    findings = await runCritic(sandbox, spec, round, nonce);
  } catch (err: unknown) {
    const msg = err instanceof Error ? err.message : String(err);
    console.error(` #${spec.id} R${String(round)}: Critic threw: ${msg}`);
    findings = null;
  }

  return { beforeSha, commits: implementerResult.commits.length, findings };
return { status: "skipped", totalCommits };
}
if (result.findings === null) {
console.warn(` #${spec.id}: Critic failed twice. Breaking (non-converged).`);
Comment thread .sandcastle/finalizer.ts
Comment on lines +220 to +249
async function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): Promise<boolean> {
  if (rebaseSucceeded) {
    try {
      await execFileAsync("git", ["push", "--force-with-lease"], {
        cwd,
        timeout: PUSH_TIMEOUT_MS,
      });
      return true;
    } catch (pushErr: unknown) {
      const pushMsg = toErrorMessage(pushErr);
      try {
        const suffix = crypto.randomBytes(4).toString("hex");
        await execFileAsync(
          "git",
          ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`],
          {
            cwd,
            timeout: PUSH_TIMEOUT_MS,
          },
        );
        console.warn(
          ` #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
        );
      } catch {
        console.error(
          ` #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
        );
      }
      return false;
    }
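The push-then-rescue flow above is hard to exercise against a real remote. One way to make it testable, sketched here under the assumption that the git invocation is injected as a function, is shown below. `pushWithRescue`, `GitRunner`, and the returned shape are all hypothetical; only the `--force-with-lease` push and the `rescue/<branch>-<suffix>` ref naming come from the snippet.

```typescript
/** Hypothetical injected runner: resolves on success, rejects when git fails. */
type GitRunner = (args: string[]) => Promise<void>;

interface PushOutcome {
  pushed: boolean;
  rescueRef: string | null; // set only when the fallback push succeeded
}

async function pushWithRescue(git: GitRunner, branch: string, suffix: string): Promise<PushOutcome> {
  try {
    await git(["push", "--force-with-lease"]);
    return { pushed: true, rescueRef: null };
  } catch {
    // Primary push failed: try to preserve the commits on a uniquely named ref.
    const rescueRef = `refs/heads/rescue/${branch}-${suffix}`;
    try {
      await git(["push", "origin", `HEAD:${rescueRef}`]);
      return { pushed: false, rescueRef };
    } catch {
      return { pushed: false, rescueRef: null };
    }
  }
}

// Fake runner that rejects the lease-protected push but accepts the rescue push.
const flakyGit: GitRunner = async (args) => {
  if (args.includes("--force-with-lease")) throw new Error("stale info");
};

void pushWithRescue(flakyGit, "feat/example", "1a2b3c4d").then((outcome) => {
  console.log(outcome.rescueRef);
});
```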
@jerome-benoit jerome-benoit changed the title feat: implement sandcastle refinement loop with critic-based convergence feat: sandcastle refinement loop with critic-based convergence May 5, 2026
Copilot AI review requested due to automatic review settings May 5, 2026 20:40
@jerome-benoit jerome-benoit merged commit b6b7db6 into main May 5, 2026
9 checks passed
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Comment thread .sandcastle/types.ts
export type LoopStatus = "converged" | "exhausted" | "failed" | "skipped";

/** Type alias for a sandcastle sandbox instance. */
export type SandboxInstance = Awaited<ReturnType<typeof sandcastle.createSandbox>>;
Comment on lines +244 to +251
if (round === 1 && result.commits === 0) {
  console.warn(` #${spec.id}: 0 commits on round 1. Skipping.`);
  return { status: "skipped", totalCommits };
}
if (result.findings === null) {
  console.warn(` #${spec.id}: Critic failed twice. Breaking (non-converged).`);
  return { status: "failed", totalCommits: totalCommits + result.commits };
}
Comment on lines +152 to +153
if (newFindings.length < bestFindingsCount) {
bestFindingsCount = newFindings.length;

if (result.commits > 0 && (await runMidLoopValidation(sandbox.worktreePath))) {
totalCommits += result.commits;
status = "converged";
Comment thread .sandcastle/finalizer.ts
validationPassed: boolean,
rebaseSucceeded: boolean,
): { isDraft: boolean; prArgs: string[] } {
const converged = loopResult.status === "converged";
Comment thread .sandcastle/finalizer.ts
Comment on lines +222 to +226
try {
  await execFileAsync("git", ["push", "--force-with-lease"], {
    cwd,
    timeout: PUSH_TIMEOUT_MS,
  });
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Implement/review refinement loop with deterministic convergence

2 participants